Abstract

Techniques have been proposed in the past for various types of finite
state machine (FSM) decomposition that use the number of states or
edges in the decomposed circuits as the cost function to be optimized.
These measures are not reflective of the true logic complexity of the
decomposed circuits. These methods have been mainly heuristic in nature
and offer limited guarantees as to the quality of the decomposition. In
this paper, we present optimum and heuristic algorithms for the general
decomposition of FSMs such that the sum total of the number of
product terms in the one-hot coded and logic minimized submachines is
minimum or minimal. This cost function is much more reflective of the
area of an optimally state-assigned and minimized submachine than the
number of states/edges in the submachine. We formulate the problem
of optimum two-way FSM decomposition as one of symbolic-output
partitioning and show that this is an easier problem than optimum state
assignment. We describe a procedure of constrained prime-implicant
generation and covering that represents an optimum FSM decomposition
algorithm, under the specified cost function. Exact procedures
are not viable for large problem instances. We give a novel iterative
optimization strategy of symbolic-implicant expansion and reduction,
modified from two-level Boolean minimizers, that represents a heuristic
algorithm based on our exact procedure. Reduction and expansion are
performed on functions with symbolic, rather than binary-valued
outputs. We present preliminary experimental results that illustrate both
the efficacy of the proposed algorithms and the validity of the selected
cost function.

1  Introduction 

The area and performance optimization of sequential circuits is
recognized as a key area. Work done in this area has involved the development
of algorithms for state assignment (e.g. [6], [8]) and decomposition of
finite state machines (e.g. [11], [10]).

Considerable progress has been made in the sequential logic synthesis
arena in the recent past. Heuristic strategies for state assignment
targeting two-level and multi-level logic implementations that achieve
high-quality solutions have been developed (e.g. [8], [3]). State machine
factorization algorithms have been developed and their relationships to
the state assignment problem have been investigated in [4], [2].

In this paper, we address the problem of the decomposition of sequential
machines into smaller interacting submachines, so as to optimize
the area and performance of the resulting implementation. Previous
approaches (e.g. [6], [4]) to finite state machine (FSM) decomposition
have used the number of states and edges in the resulting submachines
as their cost function. Given that the logic implementation of an FSM is
derived from its State Transition Graph (STG) specification after state
assignment and intensive logic optimization, this cost function does not
reflect the true complexity of the eventual logic-level implementation
and is, in fact, far from accurate. Previous approaches have been mainly
heuristic in nature and offer limited guarantees as to the quality of the
final solution as well.

The  contributions  of  the  work  presented  here  include: 

1. A formulation of the optimum two-way decomposition problem as
one of symbolic output partitioning, with an associated cost function
that is much closer to the final logic-level implementation than the
number of states/edges in the decomposed submachines, namely,
the sum total of the number of product terms in the one-hot coded
and logic minimized submachines. This cost function allows us to
predict the complicated effects of logic minimization.

2. The development of an exact solution to the above problem via
a method of prime implicant generation and constrained covering.
Exact methods for state assignment have been proposed in [5], but
here we exploit the fact that the problem of two-way FSM
decomposition is easier than that of state assignment. In particular, we
present a polynomial-time algorithm to check for the validity of a
given solution during prime implicant covering.

3. The development of a sophisticated heuristic optimization strategy
that is applicable to problems of any size. We give a novel iterative
optimization strategy of symbolic implicant expansion and
reduction, modified from two-level Boolean minimizers, that represents
a heuristic algorithm based on our exact procedure. Reduction
and expansion are performed on functions with symbolic, rather
than binary-valued outputs. Many different expansion/reduction
heuristics have been implemented and evaluated under this global
strategy.

We present basic definitions in Section 2. In Section 3, we formulate
the decomposition problem as one of symbolic output partitioning and
give an exact procedure to solve it. We give a theorem that proves
the correctness of the procedure. A heuristic expand-reduce procedure,
viable for large size problems, is presented in Section 4. Preliminary
experimental results on area and performance optimization, which illustrate
both the efficacy of the proposed algorithms and the validity of
our cost function, are presented in Section 5.

2  Preliminaries 

A finite state machine is represented by its State Transition Graph
(STG) or State Transition Table (STT), $G(V, E, W(E))$, where $V$ is
the set of vertices corresponding to the set of states $S$, where $\|S\|$ is the
cardinality of the set of states of the FSM, an edge $(v_i, v_j)$ joins $v_i$ to
$v_j$ if there is a primary input that causes the FSM to evolve from state
$v_i$ to state $v_j$, and $W(E)$ is a set of labels attached to each edge, each
label carrying the information of the value of the input that caused the
transition and the values of the primary outputs corresponding to that
transition.

A partition $\pi$ on the set $S$ is a collection of disjoint subsets whose
set union is $S$. The disjoint subsets are called the blocks of $\pi$. A factor
is $N_R$ ($> 1$) sets of states and all fanout edges from these sets of states
in the given machine. Each set of states is called an occurrence of the
factor. The maximum number of states in any of the $N_R$ occurrences
of the factor is denoted by $N_F$.
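
The definition of a partition can be checked mechanically. The following is a minimal sketch in Python; the function name `is_partition` and the three-state example are assumptions made for illustration and do not come from the paper.

```python
def is_partition(blocks, states):
    """Return True if `blocks` are non-empty, pairwise disjoint, and cover `states`."""
    seen = set()
    for block in blocks:
        # A block must be non-empty and must not overlap any earlier block.
        if not block or seen & block:
            return False
        seen |= block
    # The set union of the blocks must be exactly the state set S.
    return seen == set(states)


S = {"s1", "s2", "s3"}
valid = is_partition([{"s1", "s2"}, {"s3"}], S)
overlapping = is_partition([{"s1", "s2"}, {"s2", "s3"}], S)
```

Here `valid` holds a proper two-block partition, while `overlapping` fails because `s2` appears in both blocks.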

Given a State Graph description of a desired terminal behavior, the
essence of the decomposition problem is to find two or more machines
which, when interconnected in a prescribed way, will display that
terminal behavior. We refer to the individual machines that make up the
overall realization as submachines. The machine that results from
the interconnection of the submachines is called the decomposed
machine. By the prototype machine, we mean the machine that was
used to define the terminal behavior to be realized.

3 Exact Procedure for 2-Way General Decomposition

General decompositions can have various topologies. We are concerned
with the decomposition topology of Figure 1, where the original
machine, $M$, has been decomposed into 2 submachines, $M_1$ and $M_2$,
interconnected in the prescribed way. The output logic for the decomposed
machine is distributed between the two submachines, unlike in ((] where
a logic block external to the submachines was required to generate the
primary outputs.

Optimal state assignment of a machine corresponds to finding an
optimal multiple general decomposition of the machine. By multiple we
mean that more than 2 submachines may be produced, interacting in
much the same way as in Figure 1. The problem of a 2-way decomposition
is thus simpler than the state assignment problem. Our goal is to
provide exact or near-exact solutions to this problem.

3.1  Cost  Function 

The cost function for a general decomposition can vary depending on
the eventual targeted implementation. Here, we are concerned with
two-level implementations. The cost function used allows us to decompose
the prototype machine into submachines such that the sum of the areas
of the two-level implementation of each submachine after state
assignment is less than or close to the area of the two-level implementation of
the prototype machine after state assignment. The area of the two-level
implementation of each submachine is always less than the area of the
two-level implementation of the prototype machine. We also find that
the cost of the multi-level implementation of the decomposed machine
obtained using this cost function is almost always less than the cost of
the multi-level implementation of the prototype machine. This implies
that an optimal decomposition targeting a two-level implementation is
a good decomposition for the multi-level case.

Figure 1: General Decomposition Topology

Consider the submachines in Figure 1. Let the number of product
terms in the prototype machine, $M$, after one-hot coding and two-level
Boolean minimization be $P$. Let the number of product terms in the
submachines $M_1$ and $M_2$ after one-hot coding and two-level Boolean
minimization be $P_1$ and $P_2$, respectively. We deem a decomposition to
be optimum (optimal) if $P_1 + P_2$ is minimum (minimal). Note that in
the case where no good decomposition can be found, $P_1 + P_2 = P$. In
this case, the best decomposition corresponds to a topological partition
of the next-state lines, which are produced by one-hot coding $M$ (the
next-state lines in a one-hot coded machine cannot share logic).

Since the two-level area of each submachine obtained using this cost
function is always less than the two-level area of the prototype
machine, the critical path of the decomposed machine in Figure 1 will be
smaller than the critical path of the prototype machine in the two-level
implementation. To optimize the critical path of the decomposed
machine, the complexity of the prototype machine should be uniformly
distributed between the submachines. A modified cost function of the
form $P_1 + P_2 + \alpha \|P_1 - P_2\|$ characterizes the optimality of the
decomposition with respect to timing also.
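
To make the effect of the balance term concrete, here is a minimal sketch in Python. The function name `decomposition_cost`, the value of `alpha`, and the product-term counts are hypothetical and chosen only for illustration; they are not experimental values.

```python
def decomposition_cost(p1: int, p2: int, alpha: float = 0.0) -> float:
    """Return P1 + P2 + alpha * |P1 - P2|.

    p1, p2: product-term counts of the one-hot coded, minimized submachines.
    alpha:  weight of the balance term; alpha = 0 gives the pure area cost.
    """
    return p1 + p2 + alpha * abs(p1 - p2)


# Two hypothetical decompositions with the same area cost P1 + P2 = 20.
balanced = decomposition_cost(10, 10, alpha=0.5)
skewed = decomposition_cost(18, 2, alpha=0.5)
```

With `alpha = 0` both decompositions cost the same; with a positive `alpha` the evenly distributed one is preferred, which matches the timing argument above.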


3.2  Decomposition,  Factorization  and  Partitioning 

We formulate the optimum decomposition problem in the sequel. We
are given the initial State Transition Graph (STG) of $M$. Assume $M$
has $N$ states, $s_1, \ldots, s_N$. We construct a function $L$ as follows: The
present-state (PS) field in the STG is replaced by an $N$-valued variable.
The next-state (NS) field in $M$ is split into two symbolic variables, i.e.
$s_1$ is split into symbolic outputs $sa_1$ and $sb_1$, $s_2$ is split into symbolic
outputs $sa_2$ and $sb_2$ and so on. The primary input (PI) and output
(PO) fields are untouched. An example transformation is shown below:


  PI   PS    NS    PO
  11   s1    s2    10
  00   s1    s3    01
  01   s2    s2    11
  11   s2    s3    00


  PI   PS    NS          PO
  11   100   sa2   sb2   10
  00   100   sa3   sb3   01
  01   010   sa2   sb2   11
  11   010   sa3   sb3   00


Consider the functions $L_1$ and $L_2$ with PI and PS fields that are the
same as $L$. $L_1$ has the first NS sub-field, corresponding to the $sa_i$, and
the primary outputs. $L_2$ has the second NS sub-field corresponding to
the $sb_i$. $L_1$ (left) and $L_2$ for our example are shown below:


  PI   PS    NS    PO        PI   PS    NS
  11   100   sa2   10        11   100   sb2
  00   100   sa3   01        00   100   sb3
  01   010   sa2   11        01   010   sb2
  11   010   sa3   00        11   010   sb3
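
The construction of $L_1$ and $L_2$ from $L$ is mechanical. A minimal sketch in Python follows, assuming each row is stored as a tuple (PI, one-hot PS, NS symbol, PO); the tuple layout and the name `split_next_state` are illustrative assumptions, not part of the original formulation.

```python
# Rows of the example machine: (PI, PS as a one-hot string, NS symbol, PO).
STG = [
    ("11", "100", "s2", "10"),
    ("00", "100", "s3", "01"),
    ("01", "010", "s2", "11"),
    ("11", "010", "s3", "00"),
]


def split_next_state(rows):
    """Split the NS field of every row into the symbolic outputs sa_i and sb_i."""
    l1, l2 = [], []
    for pi, ps, ns, po in rows:
        index = ns[1:]                           # "s2" -> "2"
        l1.append((pi, ps, "sa" + index, po))    # L1 keeps the primary outputs
        l2.append((pi, ps, "sb" + index))        # L2 has only the second NS sub-field
    return l1, l2


L1, L2 = split_next_state(STG)
```

Both results share the PI and PS fields of the input, so they are topological partitions of the same function.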

$L_1$ and $L_2$ are topological partitions of the function $L$ (which
corresponds to the original STG), but they are also State Graphs of
decomposed submachines, which together realize the behavior of $M$. To
elaborate on this, we need to look at possible encodings of the $sa_i$ and
$sb_i$. Obviously, the codes for all the $s_i$ have to be distinct. The codes
for $sa_i$ and $sa_j$ can be the same if and only if the codes for $sb_i$ and $sb_j$
are different.
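
The distinctness requirement amounts to saying that the pair (code of $sa_i$, code of $sb_i$) must be unique for every state. A minimal sketch in Python, assuming codes are represented by arbitrary group labels stored in dictionaries; the names and the sample assignments are hypothetical.

```python
def all_states_distinct(code_a, code_b):
    """True if every state has a unique (sa code, sb code) pair."""
    pairs = [(code_a[s], code_b[s]) for s in code_a]
    return len(set(pairs)) == len(pairs)


# sa_1 and sa_2 share a code, so sb_1 and sb_2 must differ.
code_a = {"s1": 0, "s2": 0, "s3": 1}
code_b = {"s1": 0, "s2": 1, "s3": 0}
ok = all_states_distinct(code_a, code_b)

# Here s1 and s2 agree in both fields, so they can no longer be told apart.
clash = all_states_distinct(code_a, {"s1": 0, "s2": 0, "s3": 1})
```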

Assume a one-hot coding for the symbolic output of $L_1$ ($L_2$), with
the extra degree of freedom that some of the $sa_i$ ($sb_i$) can have the
same code (note that the present-state field has been replaced by a
multiple-valued variable and has not been split as the next-state field
has). That is, either the codes for $sa_i$ and $sa_j$ are the same or their
bitwise intersection is all zeros. Let a one-hot code for $L_1$ ($L_2$) produce
$P_1$ ($P_2$) product terms. $P_1$ can be changed only by making some of
the $sa_i$ the same, since a one-hot output coding is the worst case of
no sharing between the eventual binary-valued outputs. If all the $sa_i$
are coded with the same code, then we have merely the POs to realize
and minimum cardinality of an encoded $L_1$, but the $sb_i$ have all to be
different. This will imply that $P_1 + P_2 > P$. If one coded all the $sb_i$
to be the same, then $L_2$ is not required at all ($P_2 = 0$), but all the $sa_i$
have to be coded differently and hence $P_1 = P$.

Thus, the problem is to decide which of the $sa_i$ can be coded the same
under a one-hot code (implying that the corresponding $sb_i$ are coded
differently under a one-hot code), so as to produce a minimum $P_1 + P_2$.
This is identical to identifying two partitions [fij in the original machine
so that the one-hot coded decomposed machine corresponding to these
partitions satisfies the cost function. When the subgraph associated
with the states in one of the blocks of a partition has similar
functionality to the subgraph associated with another block of states in the
partition, this is exactly the same as identifying the best factor [1] in the
original machine across states $s_1, \ldots, s_N$, such that performing a one-hot
coding on the factored and factoring machines separately using
different fields, produces a minimum cumulative number of product terms.
If $sa_i$ is the same as $sa_j$, it means that $s_i$ and $s_j$ are both states in the
same occurrence of the extracted factor. If $sb_i$ is the same as $sb_j$, it
means that $s_i$ and $s_j$ are (1) unselected states not in the factor or (2)
corresponding states in different occurrences of the factor.

If one wishes to constrain the decomposition to extract factors with a
maximum of $N_F$ states in any occurrence of the factor, it implies that a
maximum of $N_F$ $sa_i$ can be given the same code. If we require a factor
with at least $N_R$ occurrences, it means that there have to exist at least
$N_R$ groups of $sa_i$ such that the $sa_i$ in each group have the same code.

3.3  Prime  Implicant  Generation  and  Covering 

To solve an output encoding problem, we have to modify the prime
implicant generation and covering strategies basic to Boolean
minimization. We have a simpler (and slightly different) problem here from the
classical output encoding problem, however, since we have assumed a
one-hot coding and the only degree of freedom is in giving the same
code to the symbolic outputs.

We have the functions $L_1$ and $L_2$ which have both binary-valued and
a multiple-valued input, a symbolic output and binary-valued outputs
(in the case of $L_1$). We generate generalized prime implicants (GPIs) for
$L_1$ and $L_2$ much as in the Quine-McCluskey procedure with additional
tags corresponding to the symbolic output. Initially, all minterms have
tags corresponding to the symbolic output they assert. If a minterm
that asserts a symbolic output $sa_i$ merges with a minterm asserting
the symbolic output $sa_j$, the resulting cube has both tags $sa_i$ and $sa_j$.
A cube cancels another cube if and only if their tags are identical, their
multiple-valued input parts are identical or the multiple-valued input
part of the larger cube contains a one in all positions, and the
binary-valued input part of the larger cube covers the binary-valued input
part of the smaller cube. Binary-valued outputs are treated the same
as in the Quine-McCluskey procedure. When no larger cube can be
generated, we have the set of all GPIs.
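
The cancellation rule can be written down directly. The following Python sketch assumes a hypothetical `Cube` record with a binary input string over `0`, `1`, `-`, a set of state positions holding a 1 in the multiple-valued part, and a set of symbolic tags; binary-valued outputs are left out for brevity, so this covers only the tagged part of the rule.

```python
from typing import FrozenSet, NamedTuple


class Cube(NamedTuple):
    bin_in: str              # binary-valued inputs over '0', '1', '-'
    mv_in: FrozenSet[int]    # state positions holding a 1 in the multiple-valued part
    tag: FrozenSet[str]      # symbolic outputs asserted, e.g. {"sa2", "sa3"}


def covers_binary(big: str, small: str) -> bool:
    """True if every position of `big` is a don't-care or equals `small`."""
    return all(b == "-" or b == s for b, s in zip(big, small))


def cancels(big: Cube, small: Cube, n_states: int) -> bool:
    """Tagged cancellation: identical tags, compatible MV parts, binary cover."""
    same_tag = big.tag == small.tag
    mv_ok = big.mv_in == small.mv_in or len(big.mv_in) == n_states
    return same_tag and mv_ok and covers_binary(big.bin_in, small.bin_in)


tags = frozenset({"sa2", "sa3"})
big = Cube("1-", frozenset({0, 1}), tags)
small = Cube("11", frozenset({0, 1}), tags)
removed = cancels(big, small, n_states=3)
kept = cancels(big, small._replace(tag=frozenset({"sa2"})), n_states=3)
```

The second call shows why more cubes survive than in classical minimization: a larger cube with a different tag cannot cancel a smaller one.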

Given a set of GPIs for $L_1$ and $L_2$, one has to perform the selection
of a cover such that $P_1 + P_2$ is minimum. The definition of a cover is
different from classical minimization, since we have the constraint that
all the $s_i$ have to be encoded differently. A cover has to contain all the
minterms in $L_1$ (or $L_2$). In addition, given a set of GPIs for $L_1$, namely,
$G_1$, and a set of GPIs $G_2$ for $L_2$, we have to check to see if we can code
the $sa_i$ and $sb_i$ such that all the $s_i$ have different codes and that no
GPI has a multiple-valued input part that violates the encodeability
constraints. Then, we can construct two submachines $M_1$ and $M_2$,
which when one-hot coded would produce a cover cardinality of $|G_1|$
and $|G_2|$, respectively. The minimum covering problem corresponds to
finding a minimum $|G_1| + |G_2|$. As mentioned earlier, trade-offs will
exist.

It remains to clearly define how the constraint on distinct codes for
the $s_i$ affects the selection of $G_1$ and $G_2$. For this we need to inspect
the symbolic tags of each of the implicants in $G_1$ and $G_2$, as well as
their multiple-valued input parts. We can state the following:

1. If a prime implicant $p \in G_1$ has a tag containing $sa_i$, $sa_j$, $sa_m$,
then it implies that $sa_i$, $sa_j$, $sa_m$ have been given the same code.

2. If a prime implicant $p \in G_2$ has a tag containing $sb_i$, $sb_j$, $sb_m$,
then it implies that $sb_i$, $sb_j$, $sb_m$ have been given the same code.

3. If the multiple-valued input part of a prime implicant $p \in G_1, G_2$
has a 0 in the position corresponding to $s_i$ and 1s in positions
corresponding to $s_j$, $s_m$, then it implies that either the code for
$sa_i$ is different from the codes for both $sa_j$ and $sa_m$ or that the
code for $sb_i$ is different from the codes for both $sb_j$ and $sb_m$.

3.4 Algorithm for Encodeability Check

Given the above relations, we have to check to see if all the $s_i$ can have
distinct codes for some selection of GPIs, $G_1$ and $G_2$. If so, the selection
of $G_1$ and $G_2$ is valid. Else, the selection is invalid. The check can be
accomplished in polynomial time, via the algorithm described below.

1. Construct a graph where each node is a state $s_i$ and there is an
edge with label $a$ from $s_i$ to $s_j$ if they co-exist in the tag of some
GPI in $G_1$. Similarly, there is an edge with label $b$ from $s_i$ to $s_j$ if
they co-exist in the tag of some GPI in $G_2$.

2. If any $s_i$, $s_j$ pair has edges with both labels $a$ and $b$, the selection
is invalid, exit. We call this constraint the uniqueness constraint.

3. Since we attempt to identify partitions |<>] in the prototype machine,
we impose a transitivity constraint on the graph constructed in
steps 1 and 2 above. This implies that if $sa_i$ and $sa_j$ have an edge
with label $a$ between them and $sa_j$ and $sa_k$ have an edge with
label $a$ between them, then $sa_i$ and $sa_k$ must also have an edge
with label $a$ between them. We define a clique as a subgraph such
that each pair of constituent nodes is connected by edges with the
same label. Thus, the constraint graph is composed of a set of
cliques satisfying the following properties if the selection of GPIs
does not violate step 2 above:

• All the edges in a particular clique can have only one type of
label. Thus, a clique can be identified with a label.

• Two cliques with the same label cannot have a node in
common unless both the cliques are contained in a single large
clique.

• Two cliques with a different label can have, at the most, one
node in common. Thus, any two cliques can have, at the most,
one node in common.

4. Once a graph that satisfies the encodeability constraints imposed by
the output part of the GPIs is constructed, we check for violations of
constraints imposed by the multiple-valued input part of the GPIs.
Trivial input constraints are those with a 1 in all the positions of the
multiple-valued input part or those with only one 1 in the
multiple-valued input part (because $s_i$ cannot have the same code as $s_j$ for
$i \neq j$ by virtue of steps 1 and 2). A selection of GPIs violates
an input constraint if and only if there exists a multiple-valued
input part in one of the selected GPIs and a pair of cliques in the
constraint graph such that the following conditions are satisfied:

• The intersection of the two cliques is non-null.

• The intersection of the two cliques corresponds to an $s_i$ such
that the position associated with that $s_i$ in the multiple-valued
input part is a zero.

• There is at least one $s_i$ in each of the two cliques such that the
position associated with that $s_i$ in the multiple-valued input
part is a one.

If no GPI exists such that its multiple-valued input part violates the
constraints for any pair of cliques, the cover is deemed encodeable.
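
The four steps above can be sketched in Python as follows. This is an illustrative reading of the check, not the authors' implementation: states are assumed to be numbered 0 to n-1, each GPI tag is given as a set of state indices, each multiple-valued input part is given as the set of positions holding a 1, and the transitivity constraint of step 3 is imposed with a union-find closure before uniqueness is tested. All names are hypothetical.

```python
def _cliques(n, tags):
    """Group states that share a tag, closed transitively (step 3)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for tag in tags:
        members = sorted(tag)
        for s in members[1:]:
            parent[find(s)] = find(members[0])
    groups = {}
    for s in range(n):
        groups.setdefault(find(s), set()).add(s)
    # Single states carry no edges, so they never form a constraining clique.
    return [g for g in groups.values() if len(g) > 1]


def encodeable(n, tags_a, tags_b, mv_parts):
    """Return True if the selected GPIs admit distinct codes for all states.

    tags_a, tags_b: tags of the GPIs selected in G1 and G2 (sets of state indices).
    mv_parts:       for each selected GPI, the set of positions holding a 1.
    """
    cliques_a = _cliques(n, tags_a)
    cliques_b = _cliques(n, tags_b)
    for ca in cliques_a:
        for cb in cliques_b:
            common = ca & cb
            if len(common) > 1:
                return False        # step 2: a pair has both label a and label b
            if not common:
                continue
            (s,) = common           # the single shared state
            for ones in mv_parts:
                # Step 4: shared state has a 0, and each clique holds a state with a 1.
                if s not in ones and (ca & ones) and (cb & ones):
                    return False
    return True


# State 1 shares an a-code with state 0 and a b-code with state 2.
violates = encodeable(3, [{0, 1}], [{1, 2}], [{0, 2}])
trivial = encodeable(3, [{0, 1}], [{1, 2}], [{0, 1, 2}])
```

Each pair of cliques and each multiple-valued part is inspected once, so the running time of this sketch is polynomial in the number of states and selected GPIs.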

3.5  Correctness  of  the  Exact  Algorithm 

Lemma 3.1 The steps of the encodeability check algorithm are
necessary and sufficient to ensure that the functionality of the decomposed
machine is identical to that of the prototype machine.

Proof: Necessity: The necessity of checking for the constraints in the
algorithm follows from the description of the constraints in the previous
sections.

Sufficiency: It can be shown that the functionality of the prototype
machine is maintained if the satisfaction of the above-mentioned
constraints is verified. Hence, no other constraints need to be checked for.

■ 

Lemma 3.2 A minimum cardinality encodeable solution can be made
up entirely of GPIs.

Proof: Assume that we have a minimum cardinality solution with a
cube $c_1$ that is not a GPI. We know that there exists a GPI covering $c_1$
that has the same tag as $c_1$, that its multiple-valued input part is either
the same as that of $c_1$ or has a 1 in all its positions, that its binary-input
part covers the binary-input part of $c_1$ and that its binary-output
part covers the binary-output part of $c_1$. Thus, replacing $c_1$ by the GPI
does not change the functionality, the cardinality or the encodeability
of the solution. Hence, a minimum cardinality solution can be made up
entirely of GPIs. ■

Theorem 3.3 The selection of a minimum cardinality encodeable cover
for $L_1$ and $L_2$ from the GPIs represents an exact solution to the
decomposition problem under the cost function being used.

Proof:  The  proof  follows  from  Lemmas  3.1  and  3.2.  ■ 

Thus, we have an exact algorithm for solving the decomposition
problem for the chosen cost function. This algorithm can be extended to the
problem of decomposition into multiple component machines. The
reasons that this exact algorithm may not be viable for a given problem are
that the number of GPIs may be too large and/or the covering
problem may not be solvable in reasonable time. Therefore, we require a
heuristic procedure to solve the problem.


4 Heuristic Procedure for 2-Way General Decomposition

The basic iterative strategy that has been used successfully for the
two-level Boolean minimization problem appears promising for 2-way
general decomposition also. The encodeability requirements for $G_1$ and $G_2$
are defined in the same manner as for the exact procedure. But, instead
of enumerating all the GPIs, we begin with a set of GPIs for $L_1$ and
$L_2$ and attempt to reduce their count, while maintaining the validity
of the GPI covers. We can perform operations similar to the reduce
and expand operations of MINI [7] and ESPRESSO [ 1 .*>] in an effort to
minimize the cover cardinalities.

The three basic steps in our iterative loop are given below in the order
in which they are carried out:

•  Symbolic-reduce 

•  Symbolic-expand 

•  Minimize  covers  and  remove  input  constraints 

The cost function that we use for the iterative procedure is the same as
that used for the exact algorithm, namely, the sum of the cardinalities
of the minimized one-hot coded submachines. The steps in this loop
are repeated until the cost of the solution, given by this cost function,
remains unchanged after a pass through the loop.

In the symbolic-reduce and symbolic-expand steps, we attempt to
modify the symbolic output tags of the GPIs that currently make up the
covers $G_1$ and $G_2$ so that the possibility of obtaining a cover at the end
of the minimize step, with a lower cardinality than the cardinality of the
cover at the beginning of the current pass of the loop, is increased. The
symbolic-reduce and symbolic-expand steps are analogous to the
reduce and expand steps, respectively, in iterative two-level Boolean
minimization. Unlike in two-level Boolean minimization, we do not
check that the cost of the decomposition does not increase during the
symbolic-reduce and symbolic-expand steps. On the other hand, the
symbolic-reduce and symbolic-expand steps are carried out based on
intelligent heuristics that do usually lead to a reduction in the cost
after every pass through the loop. Even so, we maintain a copy of the
best solution obtained up to the current point. The solution of the
iterative procedure is the best solution obtained up to the point when
the solution enters a local minimum that the iterative procedure cannot
climb out of. The steps of the basic iterative loop are explained below.
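
The control flow of the loop can be sketched as below. This Python skeleton is an assumption-laden illustration: the three steps and the cost function are passed in as callables, the starting cover is assumed to be valid, and the `max_passes` guard is an addition of this sketch rather than part of the procedure described here.

```python
def iterate(cover, symbolic_reduce, symbolic_expand, minimize, cost, max_passes=1000):
    """Run reduce, expand, minimize until the cost is unchanged after a pass.

    Returns the best cover seen, since the cost may rise during a pass.
    """
    best = cover
    current = cover
    for _ in range(max_passes):
        before = cost(current)
        current = minimize(symbolic_expand(symbolic_reduce(current)))
        after = cost(current)
        if after < cost(best):
            best = current      # keep a copy of the best solution so far
        if after == before:
            break               # cost unchanged after a full pass: stop
    return best


# Toy stand-ins: a "cover" is just an integer cost with a floor of 3.
improving = iterate(10, lambda c: c + 1, lambda c: c, lambda c: max(c - 2, 3), lambda c: c)
# A pass that makes things worse: the saved best solution is returned.
worsening = iterate(5, lambda c: c, lambda c: c, lambda c: 7, lambda c: c)
```

The second toy run illustrates why a copy of the best solution is kept: the loop settles at a cost of 7, but the returned solution is the earlier one with cost 5.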

In the minimization step, which follows the symbolic-expand, a
two-level minimization of both the covers, $G_1$ and $G_2$, is carried out. This
step incorporates all the cube-merging that becomes possible as a result
of the symbolic-expand. We call the cover produced as a result of the
minimization the over-minimized cover because the minimization is
carried out without taking into account violations of the input constraints, and
may therefore not be encodeable. We unravel the multiple-valued input
parts of the cubes to the minimum extent necessary to make the
over-minimized cover encodeable. Whenever an input constraint violation
is detected, the cardinality of the cover has to be increased by one to
remove it. The procedure for detecting the input constraint violations
and removing them is sped up significantly by the use of intelligent
pruning techniques that greatly reduce the search space.

The goal of the symbolic-expand procedure is to increase the size
of the output tags of the GPIs in each cover till some form of primality
is achieved. We consider the cover to be prime when no symbol can be
added to the output tag of any GPI without violating the uniqueness
constraint. The atomic operation in the expansion procedure inserts
two states in the same output tag of a GPI, checking while doing so
that the uniqueness constraint is not violated. This atomic step can
have two major effects:

• It makes new cube merges possible.

• It can result in additional input constraint violations.

Symbolic-expand is order dependent. The ordering heuristic attempts
to maximize the occurrence of the first effect and minimize that of the
second.
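The order dependence of the expansion can be illustrated with a small greedy sketch. Everything here is an assumption made for illustration: a cover is reduced to its list of output tags, `pair_order` is the ordered list of state pairs, and `allowed` is a hypothetical predicate standing in for the uniqueness constraint.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

Tag = FrozenSet[str]  # the set of state symbols in one GPI's output tag


def symbolic_expand(
    tags: Sequence[Tag],
    pair_order: Sequence[Tuple[str, str]],
    allowed: Callable[[List[Tag], int, Tag], bool],
) -> List[Tag]:
    """Greedily grow output tags, trying state pairs in the given order."""
    tags = list(tags)
    changed = True
    while changed:  # stop when no pair can be inserted into any tag
        changed = False
        for s, t in pair_order:
            for k, tag in enumerate(tags):
                grown = tag | {s, t}  # atomic step: put both states in the tag
                if grown != tag and allowed(tags, k, grown):
                    tags[k] = grown
                    changed = True
    return tags


# Toy stand-in for the uniqueness constraint (NOT the paper's definition):
# a tag may hold at most two states.
def toy_allowed(tags: List[Tag], k: int, grown: Tag) -> bool:
    return len(grown) <= 2


start = [frozenset({"s1"})]
first = symbolic_expand(start, [("s1", "s2"), ("s1", "s3")], toy_allowed)
second = symbolic_expand(start, [("s1", "s3"), ("s1", "s2")], toy_allowed)
```

With the toy constraint, the two orderings end in different prime covers (`first` holds s1 and s2, `second` holds s1 and s3), which is the behavior the ordering heuristic has to manage.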

The symbolic-reduce operation transforms the prime cover into a
non-prime cover. This operation is essential to the iterative process for
moving out of the local minimum that it may have entered following the
symbolic-expand and minimization steps. In converting a prime cover
to a non-prime cover, the basic operation used by symbolic-reduce is to
remove a state from the symbolic output tags that it is contained in,
when the tags contain more than one state in them, while maintaining
functionality. The states selected for removal are those whose insertion
in the output tags of GPIs during the symbolic-expand generated new
input constraints. Because the input constraints are generated due to
non-null intersections between cliques with different labels, we ensure
that after the symbolic-reduce, the intersection between any pair of
cliques is null. Such a cover is said to be maximally reduced. This
formulation of the symbolic-reduce operation is order independent.
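The stopping condition for the reduction, pairwise null intersections between cliques, is easy to state as a check. The sketch below assumes that each clique is available as a plain set of state names, which is a representation chosen for illustration only.

```python
from itertools import combinations
from typing import Iterable, Set


def is_maximally_reduced(cliques: Iterable[Set[str]]) -> bool:
    """Return True when every pair of cliques has an empty intersection."""
    sets = [set(c) for c in cliques]
    return all(a.isdisjoint(b) for a, b in combinations(sets, 2))


# Hypothetical cliques: the first family overlaps in "s2", the second does not.
overlapping = [{"s1", "s2"}, {"s2", "s3"}]
disjoint = [{"s1", "s2"}, {"s3"}]
```

Here `is_maximally_reduced(overlapping)` is false and `is_maximally_reduced(disjoint)` is true. Since the condition depends only on the final sets, it is consistent with the reduction being order independent.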

The ordering of state pairs at the beginning of the symbolic-expand,
which follows the symbolic-reduce operation, uses knowledge of the
existence of cliques that remained after the symbolic-reduce. The new
ordering attempts to explore a different direction for the symbolic-expand
that could lead to a better solution than the solution from the previous
pass through the loop.

1 Averaged over examples with area < 1500.

Whenever applicable, the first number in a box corresponds to
submachine 1, the second to submachine 2, and the third is the
overall number for the decomposed machine.

Table 1: Statistics of the Encoded Decomposed Machines

1 Area normalized w.r.t. multi-level area of the prototype machine.

The misII-wolfe pipeline was used to compute the multi-level areas.

Table 2: Comparison of the Multi-level Areas

5  Results 

We have implemented the heuristic procedure for optimal 2-way general
decomposition in a program called h-decom. The input to h-decom
is a KISS-style [8] State Table description of the prototype machine.
As output, h-decom can produce either a fully encoded decomposed
machine or a decomposed machine in which the PS inputs to the
submachines and their NS outputs are symbolic. The efficacy of the
decomposition can be judged by comparing the following criteria:

• The areas of the two-level (multi-level) implementation of the
encoded submachines and of the two-level (multi-level) implementation
of the encoded prototype machine.

• The areas of the two-level implementation of the encoded
submachines and of a vertically partitioned two-level implementation of
the encoded prototype machine.

• The areas of the two-level (multi-level) implementation of the larger
of the two encoded submachines and of the two-level (multi-level)
implementation of the encoded prototype machine. The two-level
area of the larger of the two submachines is an indicator of the
critical path of the decomposed machine.

• The overall cardinalities of the two-level cover of the encoded and
minimized decomposed submachines and of the two-level cover of
the encoded and minimized prototype machine.

• The total number of inputs and outputs in each encoded
submachine and in the encoded prototype machine.
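The two-level comparisons in this list can be sketched numerically. The snippet below assumes the common PLA area estimate (2 × inputs + outputs) × product terms, which is not stated in this section. The class `Pla`, the function `compare` and all the numbers are hypothetical and are not measurements from the tables.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Pla:
    inputs: int         # primary inputs plus present-state lines
    outputs: int        # primary outputs plus next-state lines
    product_terms: int  # cardinality of the minimized two-level cover

    @property
    def area(self) -> int:
        # Assumed estimate: two columns per input, one per output.
        return (2 * self.inputs + self.outputs) * self.product_terms


def compare(prototype: Pla, sub1: Pla, sub2: Pla) -> Dict[str, float]:
    """Ratios of decomposed-machine figures to the prototype's figures."""
    return {
        # Total area of both submachines against the prototype.
        "total_area_ratio": (sub1.area + sub2.area) / prototype.area,
        # Larger submachine: the critical-path indicator named in the list.
        "largest_sub_ratio": max(sub1.area, sub2.area) / prototype.area,
        # Overall cover cardinality against the prototype's.
        "term_ratio": (sub1.product_terms + sub2.product_terms)
        / prototype.product_terms,
    }


# Hypothetical machines, for illustration only.
ratios = compare(Pla(10, 8, 50), Pla(8, 4, 20), Pla(9, 5, 22))
```

A ratio below 1 means that the decomposed machine is better on that criterion. The routing area for shared inputs is ignored in this estimate.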

The heuristic algorithm in h-decom was tested on a number of
benchmark examples. The statistics of the examples are available in the
public domain. A KISS-style encoding strategy was used on the
submachines and the prototype machines. It can be observed from Table 1
that h-decom is successful in finding good 2-way general
decompositions. It should be noted that the two-level area of each
submachine is always substantially less than that of the prototype
machine, implying that the critical path of the decomposed circuit is
always shorter than that of the prototype circuit. It appears that the
primary interest in using decomposition tools in industry stems from a
need to improve the performance of FSM controllers, which often dictate
the required duration of the system clock.

The total two-level area of the decomposed machine is usually less
than that of the prototype machine. Since the two submachines have
common inputs, some extra routing area is required, over and above
the PLA area. However, this extra area is small in comparison to the
PLA areas and does not offset the area gain via decomposition. As
can be seen from Table 1, the number of inputs for both submachines
need not be equal. A submachine may be independent of some of the
primary inputs and present state lines.

It is apparent from the topology of Figure 1 that we do not add extra
levels of logic to the network. Thus, the reduction in area results
from the same causes as in the vertical partitioning of PLAs. In general,
a good vertical partition need not exist for a PLA.
Our method of decomposition ensures that a good vertical partition does
exist, which implies that the performance of the FSM can be improved
without compromising the area.

We also report the multi-level areas for the large examples, for which
a two-level implementation may not be efficient, in Table 2. It can
be seen that the multi-level area of the decomposed machine is almost
always smaller than that of the prototype machine, even though our
decomposition strategy is not geared specifically toward decomposition
for optimizing multi-level area. To obtain the multi-level areas,
decompositions targeting optimal performance were used as starting points.
The results imply that a good decomposition targeting two-level area is
usually a good decomposition for the multi-level case.

These results are significantly better than those obtained via
factorization [4].

6  Conclusions 

We have proposed exact and heuristic algorithms for optimum and
optimal 2-way general decomposition of finite state machines. These
algorithms are based on a cost function that is more reflective of the
cost of the logic-level implementation of the decomposed machine than
the cost function used by previous approaches to the decomposition
problem. We have implemented the heuristic algorithm in the program
h-decom. Good decompositions were obtained using h-decom for a
large number of benchmark examples.